6 research outputs found

    Applying Data Mining Techniques Over Big Data

    Full text link
    With the rapid development of information technology, data flows in a wide variety of formats: sensor data, tweets, photos, raw data, and unstructured data. Statistics show that about 800,000 petabytes were stored worldwide in 2000. Today the Internet holds about 1.8 zettabytes (1 zettabyte = 10^21 bytes), and this number is projected to reach 35 zettabytes by 2020. Traditional data management systems cannot scale to this huge amount of raw, unstructured data, which is what is today called big data. In this study, we present the basic concepts and design of big data tools, algorithms, and techniques. We compare classical data mining algorithms with big data algorithms, using Hadoop/MapReduce as the core implementation of scalable big data algorithms. We implemented the K-means and Apriori algorithms with Hadoop/MapReduce on a 5-node Hadoop cluster, and we report their performance on gigabytes of data. Finally, we explore NoSQL (Not Only SQL) databases for semi-structured, massively large-scale data, using MongoDB as an example, and compare the performance of the HDFS (Hadoop Distributed File System) and MongoDB data stores for these two algorithms. This research is part of a full Master's degree scholarship funded by the Ministry of Higher Education and Scientific Research (MOHESR), Republic of Iraq (Fund 17004).
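    The core idea of running K-means on MapReduce is that the map step assigns each point to its nearest centroid and the reduce step averages the points per centroid. The paper's actual implementation runs on Hadoop; the following is only a minimal single-process sketch of one such iteration, with all function names and data invented for illustration.

    ```python
    from math import dist  # Euclidean distance (Python 3.8+)

    def kmeans_map(point, centroids):
        """Map step: emit (nearest-centroid-index, point) for one input point."""
        nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
        return nearest, point

    def kmeans_reduce(idx, points):
        """Reduce step: average all points assigned to centroid idx."""
        n = len(points)
        return idx, tuple(sum(coord) / n for coord in zip(*points))

    def kmeans_iteration(points, centroids):
        """One MapReduce round: group mapped pairs by key (the shuffle), then reduce."""
        groups = {}
        for p in points:
            idx, pt = kmeans_map(p, centroids)
            groups.setdefault(idx, []).append(pt)
        return [kmeans_reduce(i, pts)[1] for i, pts in sorted(groups.items())]

    points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
    centroids = [(1.0, 1.0), (9.0, 9.0)]
    print(kmeans_iteration(points, centroids))  # two updated centroids
    ```

    In a real Hadoop job the shuffle is performed by the framework between the map and reduce phases, and the driver re-runs the job until the centroids converge.
    
    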

    A Parallel Interval-Based Constraint Language: Implementation and Performance Analysis

    No full text
    This paper presents the design and implementation of a non-deterministic constraint programming language based on interval variables (integer or floating-point intervals) on shared-memory multiprocessing computers. The non-deterministic constructs in the language are the choice and split statements. Each processor narrows the constraints encountered along the path of the execution tree. Whenever a solution is found, the processor is re-used in exploring other branches of the execution tree. Similarly, a failed narrowing frees the corresponding processor to be re-used by a subsequent split or choice statement. A meta-level interpreter is presented to describe the execution behavior of the language. The behavior of the meta-level interpreter is also approximated by considering strings generated by a context-free grammar and derivable from the interpreter rules. A novel approach presented here is an attempt to determine the average-case speed-up by generating uniform random strings us..
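    Narrowing, the basic operation each processor performs in the abstract above, shrinks variable intervals so that a constraint can still be satisfied; an empty intersection signals failure and frees the processor. A minimal sketch of narrowing for a single constraint x + y = s, with all names invented here for illustration:

    ```python
    def intersect(a, b):
        """Intersect two closed intervals; return None if the result is empty."""
        lo, hi = max(a[0], b[0]), min(a[1], b[1])
        return (lo, hi) if lo <= hi else None

    def narrow_sum(x, y, s):
        """Narrow intervals x and y under the constraint x + y = s.

        Uses interval arithmetic: x := x ∩ (s - y), then y := y ∩ (s - x).
        Returns the narrowed pair, or None on a failed narrowing.
        """
        x = intersect(x, (s[0] - y[1], s[1] - y[0]))
        if x is None:
            return None
        y = intersect(y, (s[0] - x[1], s[1] - x[0]))
        if y is None:
            return None
        return x, y

    # x, y ∈ [0, 10] with x + y = 4 narrows both variables to [0, 4]
    print(narrow_sum((0.0, 10.0), (0.0, 10.0), (4.0, 4.0)))
    ```

    In the language described, a split or choice statement would then subdivide one of the narrowed intervals and assign the branches to available processors.
    
    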

    Data-Parallel Design of an Interval-Based Constraint Language: Analysis of Speed-Ups

    No full text
    This paper presents what is believed to be the very first design and implementation of a deterministic data-parallel constraint language based on interval variables (integer or floating point intervals). The approach used consists of having one constraint and one interval variable per processor. Each processor narrows and intersects the constraint it stores within its memory using interval variables which are stored in different processors. Whenever a solution is found, it is printed and the processor is re-used in implementing the splitting operation. Similarly, a failed narrowing or intersection frees the corresponding processor to be re-used in a subsequent split operation. The algorithm starts with an initial set of processors corresponding to the constraints to be solved and their variables. Narrowing and intersections are applied until convergence. Splitting is done by selecting the variable with the smallest interval and introducing a new set of processors whose data is copied ..
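    The splitting step described above selects the variable with the smallest interval and subdivides its domain, producing sub-problems for the newly introduced processors. A small sequential sketch of that selection rule (names and the midpoint split are illustrative assumptions, not the paper's exact scheme):

    ```python
    def split_smallest(domains):
        """Split the variable with the smallest non-degenerate interval at its midpoint.

        domains maps variable names to (lo, hi) intervals. Returns two
        sub-problems, or [] when every interval is degenerate (a solution).
        """
        widths = {v: hi - lo for v, (lo, hi) in domains.items() if hi > lo}
        if not widths:
            return []
        v = min(widths, key=widths.get)          # smallest remaining interval
        lo, hi = domains[v]
        mid = (lo + hi) / 2
        left = dict(domains, **{v: (lo, mid)})   # copy domains for each branch,
        right = dict(domains, **{v: (mid, hi)})  # as the new processors would
        return [left, right]

    print(split_smallest({"x": (0.0, 4.0), "y": (1.0, 2.0)}))  # splits y, not x
    ```

    Each returned sub-problem would then be narrowed to convergence again, mirroring the narrow/intersect/split cycle in the abstract.
    
    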

    Business Value of Information Technology: A Study of Electronic Data Interchange

    No full text
    A great deal of controversy exists about the impact of information technology on firm performance. While some authors have reported positive impacts, others have found negative or no impacts. This study focuses on Electronic Data Interchange (EDI) technology. Many of the problems in this line of research are overcome in this study by conducting a careful analysis of the performance data of the past decade gathered from the assembly centers of Chrysler Corporation. This study estimates the dollar benefits of improved information exchanges between Chrysler and its suppliers that result from using EDI. After controlling for variations in operational complexity arising from mix, volume, parts complexity, model, and engineering changes, the savings per vehicle that result from improved information exchanges are estimated to be about $60. Including the additional savings from electronic document preparation and transmission, the total benefits of EDI per vehicle amount to over $100. System-wide, this translates to annual savings of $220 million for the company
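    A quick arithmetic check of the figures reported in this abstract: the per-vehicle and system-wide numbers together imply an annual production volume, which the abstract itself does not state and is purely an inference here.

    ```python
    # Figures as reported in the abstract (dollars):
    benefit_per_vehicle = 100        # total EDI benefit per vehicle ("over $100")
    annual_savings = 220_000_000     # system-wide annual savings ("$220 million")

    # Implied annual vehicle volume (an inference, not stated in the abstract):
    implied_vehicles = annual_savings / benefit_per_vehicle
    print(f"implied volume: {implied_vehicles:,.0f} vehicles/year")
    ```

    Since the per-vehicle benefit is a lower bound ("over $100"), the implied volume is correspondingly an upper bound on the production level consistent with these figures.
    
    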